This notebook is a demonstration of a method to visualize Ice911's cloud-hosted data using xarray. The approach modeled here is widely used in the Pangeo open-source community, and a similar walkthrough from them can be found here.
Many climate modelers process their data with command-line toolkits such as NCO and CDO. However, these tools require working with the data through a command-line interface that I find slow and prone to errors. In this demo, I show how Zarr and xarray, two open-source Python tools well suited to parallel computing in the cloud, can significantly decrease data analysis time.
First, we load a few of the libraries we will be using for data intake. s3fs is a library for accessing files on Amazon S3. zarr is a Python library for reading and writing chunked, compressed arrays that are cheap to access piece by piece. xarray is a library for working with labeled multidimensional arrays, commonly used to handle geospatial data. xarray integrates with dask, a library for parallelizing computation in Python. We will be using dask.distributed to make the multiple cores within our remote machine behave like a multi-computer cluster.
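As a minimal sketch of that last step, the block below starts a local dask.distributed cluster and connects a client to it. The helper name `start_local_cluster`, the worker counts, and the choice of in-process workers are assumptions made so the example runs anywhere; the cluster settings used for this notebook are not shown.

```python
from dask.distributed import Client, LocalCluster


def start_local_cluster(n_workers=2, threads_per_worker=1):
    """Start a small local cluster and return a client connected to it.

    Hypothetical helper: the worker counts are illustrative defaults.
    """
    cluster = LocalCluster(
        n_workers=n_workers,
        threads_per_worker=threads_per_worker,
        processes=False,         # in-process workers; True gives one process per worker
        dashboard_address=None,  # skip the diagnostics dashboard in this sketch
    )
    return Client(cluster)


client = start_local_cluster()

# Once a client exists, dask-backed computations are sent to its workers.
total = client.submit(sum, [1, 2, 3]).result()
print(total)
```

On a multi-core machine, raising `n_workers` and setting `processes=True` is the usual way to spread work across cores; which settings suit a given machine is a judgment call.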
With parallelization set up, we're ready to load the data. I already uploaded Zarr datastores of every simulation case to the data-viz-server S3 bucket. This data contains all of the variables and metadata from the original NetCDF data. When we run the code below, the cases dictionary will be populated with xarray datasets fetched from the cloud. This code will execute very quickly because only the metadata is fetched -- the variable data itself is only downloaded as needed.
In [3]:
import s3fs
import xarray as xr

# Map a short case name to the Zarr store that holds that simulation
case_paths = {
    "GLOBAL": "cobalt.GLOBAL02_ZARR/",
    "GLOBAL_MAY_NAO": "cobalt.GLOBAL02.MAY.NAO_ZARR/",
    "GLOBAL_MAY_PAO": "cobalt.GLOBAL02.MAY.PAO_ZARR/",
    "CONTROL": "CONTROL03_ZARR/",
    "CONTROL_MAY_NAO": "CONTROL03.MAY.NAO_ZARR/",
    "CONTROL_MAY_PAO": "CONTROL03.MAY.PAO_ZARR/",
    "FRAM_MAY": "cobalt.FRAM.MAY_ZARR/",
    "FRAM_MAY_NAO": "cobalt.FRAM.MAY.NAO_ZARR/",
    "FRAM_MAY_PAO": "cobalt.FRAM.MAY.PAO_ZARR/",
}

# Initialize the S3 file system once and reuse it for every case
s3 = s3fs.S3FileSystem()

cases = {}
for k, v in case_paths.items():
    s3_path = 's3://data-viz-server/{}'.format(v)
    store = s3fs.S3Map(root=s3_path, s3=s3, check=False)
    # Read Zarr file and put into cases dictionary
    cases[k] = xr.open_zarr(store=store, consolidated=True)
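To make the lazy loading concrete, here is a small sketch that uses a synthetic, dask-backed dataset in place of the S3 data. The dataset `demo`, its sizes, and its random values are stand-ins invented for illustration; only the chunk-per-time-step layout mirrors the stores described above.

```python
import dask.array as da
import numpy as np
import xarray as xr

# Hypothetical stand-in for one case: 12 time steps, one chunk per time step.
values = da.random.random((12, 19, 36), chunks=(1, 19, 36)).astype("float32")
demo = xr.Dataset(
    {"FLNT": (("time", "lat", "lon"), values)},
    coords={
        "time": np.arange(12),
        "lat": np.linspace(-90, 90, 19),
        "lon": np.arange(0, 360, 10.0),
    },
)

# Selecting a time step only builds a task graph; no array values exist yet.
one_step = demo["FLNT"].isel(time=0)
print(type(one_step.data))

# .compute() is the point where the chunk is actually read (or downloaded).
loaded = one_step.compute()
print(type(loaded.data), loaded.shape)
```

With a Zarr store on S3, the same pattern means that only the chunks touched by a selection are transferred.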
Now, this data contains all of the variables. Let's look at the CONTROL case, for instance.
In [4]:
cases["CONTROL"]
Out[4]:
<xarray.Dataset>
Dimensions: (ilev: 27, lat: 192, lev: 26, lon: 288, nbnd: 2, slat: 191, slon: 288, time: 977)
Coordinates:
* ilev (ilev) float64 2.194 4.895 9.882 18.05 ... 956.0 985.1 1e+03
* lat (lat) float64 -90.0 -89.06 -88.12 -87.17 ... 88.12 89.06 90.0
* lev (lev) float64 3.545 7.389 13.97 23.94 ... 929.6 970.6 992.6
* lon (lon) float64 0.0 1.25 2.5 3.75 ... 355.0 356.2 357.5 358.8
* slat (slat) float64 -89.53 -88.59 -87.64 ... 87.64 88.59 89.53
* slon (slon) float64 -0.625 0.625 1.875 3.125 ... 355.6 356.9 358.1
* time (time) object 2000-02-01 00:00:00 ... 2081-06-01 00:00:00
Dimensions without coordinates: nbnd
Data variables:
AEROD_v (time, lat, lon) float32 dask.array<chunksize=(1, 192, 288), meta=np.ndarray>
CLDHGH (time, lat, lon) float32 dask.array<chunksize=(1, 192, 288), meta=np.ndarray>
CLDICE (time, lev, lat, lon) float32 dask.array<chunksize=(1, 26, 192, 288), meta=np.ndarray>
CLDLIQ (time, lev, lat, lon) float32 dask.array<chunksize=(1, 26, 192, 288), meta=np.ndarray>
CLDLOW (time, lat, lon) float32 dask.array<chunksize=(1, 192, 288), meta=np.ndarray>
CLDMED (time, lat, lon) float32 dask.array<chunksize=(1, 192, 288), meta=np.ndarray>
CLDTOT (time, lat, lon) float32 dask.array<chunksize=(1, 192, 288), meta=np.ndarray>
CLOUD (time, lev, lat, lon) float32 dask.array<chunksize=(1, 26, 192, 288), meta=np.ndarray>
CONCLD (time, lev, lat, lon) float32 dask.array<chunksize=(1, 26, 192, 288), meta=np.ndarray>
DCQ (time, lev, lat, lon) float32 dask.array<chunksize=(1, 26, 192, 288), meta=np.ndarray>
DTCOND (time, lev, lat, lon) float32 dask.array<chunksize=(1, 26, 192, 288), meta=np.ndarray>
DTV (time, lev, lat, lon) float32 dask.array<chunksize=(1, 26, 192, 288), meta=np.ndarray>
EMIS (time, lev, lat, lon) float32 dask.array<chunksize=(1, 26, 192, 288), meta=np.ndarray>
FICE (time, lev, lat, lon) float32 dask.array<chunksize=(1, 26, 192, 288), meta=np.ndarray>
FLDS (time, lat, lon) float32 dask.array<chunksize=(1, 192, 288), meta=np.ndarray>
FLDSC (time, lat, lon) float32 dask.array<chunksize=(1, 192, 288), meta=np.ndarray>
FLNS (time, lat, lon) float32 dask.array<chunksize=(1, 192, 288), meta=np.ndarray>
FLNSC (time, lat, lon) float32 dask.array<chunksize=(1, 192, 288), meta=np.ndarray>
FLNT (time, lat, lon) float32 dask.array<chunksize=(1, 192, 288), meta=np.ndarray>
FLNTC (time, lat, lon) float32 dask.array<chunksize=(1, 192, 288), meta=np.ndarray>
FLUT (time, lat, lon) float32 dask.array<chunksize=(1, 192, 288), meta=np.ndarray>
FLUTC (time, lat, lon) float32 dask.array<chunksize=(1, 192, 288), meta=np.ndarray>
FSDS (time, lat, lon) float32 dask.array<chunksize=(1, 192, 288), meta=np.ndarray>
FSDSC (time, lat, lon) float32 dask.array<chunksize=(1, 192, 288), meta=np.ndarray>
FSDTOA (time, lat, lon) float32 dask.array<chunksize=(1, 192, 288), meta=np.ndarray>
FSNS (time, lat, lon) float32 dask.array<chunksize=(1, 192, 288), meta=np.ndarray>
FSNSC (time, lat, lon) float32 dask.array<chunksize=(1, 192, 288), meta=np.ndarray>
FSNT (time, lat, lon) float32 dask.array<chunksize=(1, 192, 288), meta=np.ndarray>
FSNTC (time, lat, lon) float32 dask.array<chunksize=(1, 192, 288), meta=np.ndarray>
FSNTOA (time, lat, lon) float32 dask.array<chunksize=(1, 192, 288), meta=np.ndarray>
FSNTOAC (time, lat, lon) float32 dask.array<chunksize=(1, 192, 288), meta=np.ndarray>
FSUTOA (time, lat, lon) float32 dask.array<chunksize=(1, 192, 288), meta=np.ndarray>
ICEFRAC (time, lat, lon) float32 dask.array<chunksize=(1, 192, 288), meta=np.ndarray>
ICIMR (time, lev, lat, lon) float32 dask.array<chunksize=(1, 26, 192, 288), meta=np.ndarray>
ICWMR (time, lev, lat, lon) float32 dask.array<chunksize=(1, 26, 192, 288), meta=np.ndarray>
LANDFRAC (time, lat, lon) float32 dask.array<chunksize=(1, 192, 288), meta=np.ndarray>
LHFLX (time, lat, lon) float32 dask.array<chunksize=(1, 192, 288), meta=np.ndarray>
LWCF (time, lat, lon) float32 dask.array<chunksize=(1, 192, 288), meta=np.ndarray>
OCNFRAC (time, lat, lon) float32 dask.array<chunksize=(1, 192, 288), meta=np.ndarray>
OMEGA (time, lev, lat, lon) float32 dask.array<chunksize=(1, 26, 192, 288), meta=np.ndarray>
OMEGAT (time, lev, lat, lon) float32 dask.array<chunksize=(1, 26, 192, 288), meta=np.ndarray>
P0 (time) float64 dask.array<chunksize=(977,), meta=np.ndarray>
PBLH (time, lat, lon) float32 dask.array<chunksize=(1, 192, 288), meta=np.ndarray>
PHIS (time, lat, lon) float32 dask.array<chunksize=(1, 192, 288), meta=np.ndarray>
PRECC (time, lat, lon) float32 dask.array<chunksize=(1, 192, 288), meta=np.ndarray>
PRECL (time, lat, lon) float32 dask.array<chunksize=(1, 192, 288), meta=np.ndarray>
PRECSC (time, lat, lon) float32 dask.array<chunksize=(1, 192, 288), meta=np.ndarray>
PRECSL (time, lat, lon) float32 dask.array<chunksize=(1, 192, 288), meta=np.ndarray>
PS (time, lat, lon) float32 dask.array<chunksize=(1, 192, 288), meta=np.ndarray>
PSL (time, lat, lon) float32 dask.array<chunksize=(1, 192, 288), meta=np.ndarray>
Q (time, lev, lat, lon) float32 dask.array<chunksize=(1, 26, 192, 288), meta=np.ndarray>
QFLX (time, lat, lon) float32 dask.array<chunksize=(1, 192, 288), meta=np.ndarray>
QREFHT (time, lat, lon) float32 dask.array<chunksize=(1, 192, 288), meta=np.ndarray>
QRL (time, lev, lat, lon) float32 dask.array<chunksize=(1, 26, 192, 288), meta=np.ndarray>
QRS (time, lev, lat, lon) float32 dask.array<chunksize=(1, 26, 192, 288), meta=np.ndarray>
RELHUM (time, lev, lat, lon) float32 dask.array<chunksize=(1, 26, 192, 288), meta=np.ndarray>
SFCLDICE (time, lat, lon) float32 dask.array<chunksize=(1, 192, 288), meta=np.ndarray>
SFCLDLIQ (time, lat, lon) float32 dask.array<chunksize=(1, 192, 288), meta=np.ndarray>
SHFLX (time, lat, lon) float32 dask.array<chunksize=(1, 192, 288), meta=np.ndarray>
SNOWHICE (time, lat, lon) float32 dask.array<chunksize=(1, 192, 288), meta=np.ndarray>
SNOWHLND (time, lat, lon) float32 dask.array<chunksize=(1, 192, 288), meta=np.ndarray>
SOLIN (time, lat, lon) float32 dask.array<chunksize=(1, 192, 288), meta=np.ndarray>
SWCF (time, lat, lon) float32 dask.array<chunksize=(1, 192, 288), meta=np.ndarray>
T (time, lev, lat, lon) float32 dask.array<chunksize=(1, 26, 192, 288), meta=np.ndarray>
TAUX (time, lat, lon) float32 dask.array<chunksize=(1, 192, 288), meta=np.ndarray>
TAUY (time, lat, lon) float32 dask.array<chunksize=(1, 192, 288), meta=np.ndarray>
TGCLDCWP (time, lat, lon) float32 dask.array<chunksize=(1, 192, 288), meta=np.ndarray>
TGCLDIWP (time, lat, lon) float32 dask.array<chunksize=(1, 192, 288), meta=np.ndarray>
TGCLDLWP (time, lat, lon) float32 dask.array<chunksize=(1, 192, 288), meta=np.ndarray>
TMQ (time, lat, lon) float32 dask.array<chunksize=(1, 192, 288), meta=np.ndarray>
TREFHT (time, lat, lon) float32 dask.array<chunksize=(1, 192, 288), meta=np.ndarray>
TS (time, lat, lon) float32 dask.array<chunksize=(1, 192, 288), meta=np.ndarray>
TSMN (time, lat, lon) float32 dask.array<chunksize=(1, 192, 288), meta=np.ndarray>
TSMX (time, lat, lon) float32 dask.array<chunksize=(1, 192, 288), meta=np.ndarray>
U (time, lev, lat, lon) float32 dask.array<chunksize=(1, 26, 192, 288), meta=np.ndarray>
U10 (time, lat, lon) float32 dask.array<chunksize=(1, 192, 288), meta=np.ndarray>
UU (time, lev, lat, lon) float32 dask.array<chunksize=(1, 26, 192, 288), meta=np.ndarray>
V (time, lev, lat, lon) float32 dask.array<chunksize=(1, 26, 192, 288), meta=np.ndarray>
VD01 (time, lev, lat, lon) float32 dask.array<chunksize=(1, 26, 192, 288), meta=np.ndarray>
VQ (time, lev, lat, lon) float32 dask.array<chunksize=(1, 26, 192, 288), meta=np.ndarray>
VT (time, lev, lat, lon) float32 dask.array<chunksize=(1, 26, 192, 288), meta=np.ndarray>
VU (time, lev, lat, lon) float32 dask.array<chunksize=(1, 26, 192, 288), meta=np.ndarray>
VV (time, lev, lat, lon) float32 dask.array<chunksize=(1, 26, 192, 288), meta=np.ndarray>
Z3 (time, lev, lat, lon) float32 dask.array<chunksize=(1, 26, 192, 288), meta=np.ndarray>
ch4vmr (time) float64 dask.array<chunksize=(1,), meta=np.ndarray>
co2vmr (time) float64 dask.array<chunksize=(1,), meta=np.ndarray>
date (time) int32 dask.array<chunksize=(1,), meta=np.ndarray>
date_written (time) |S8 dask.array<chunksize=(1,), meta=np.ndarray>
datesec (time) int32 dask.array<chunksize=(1,), meta=np.ndarray>
f11vmr (time) float64 dask.array<chunksize=(1,), meta=np.ndarray>
f12vmr (time) float64 dask.array<chunksize=(1,), meta=np.ndarray>
gw (time, lat) float64 dask.array<chunksize=(1, 192), meta=np.ndarray>
hyai (time, ilev) float64 dask.array<chunksize=(1, 27), meta=np.ndarray>
hyam (time, lev) float64 dask.array<chunksize=(1, 26), meta=np.ndarray>
hybi (time, ilev) float64 dask.array<chunksize=(1, 27), meta=np.ndarray>
hybm (time, lev) float64 dask.array<chunksize=(1, 26), meta=np.ndarray>
mdt (time) int32 dask.array<chunksize=(977,), meta=np.ndarray>
n2ovmr (time) float64 dask.array<chunksize=(1,), meta=np.ndarray>
nbdate (time) int32 dask.array<chunksize=(977,), meta=np.ndarray>
nbsec (time) int32 dask.array<chunksize=(977,), meta=np.ndarray>
ndbase (time) int32 dask.array<chunksize=(977,), meta=np.ndarray>
ndcur (time) int32 dask.array<chunksize=(1,), meta=np.ndarray>
nlon (time, lat) int32 dask.array<chunksize=(1, 192), meta=np.ndarray>
nsbase (time) int32 dask.array<chunksize=(977,), meta=np.ndarray>
nscur (time) int32 dask.array<chunksize=(1,), meta=np.ndarray>
nsteph (time) int32 dask.array<chunksize=(1,), meta=np.ndarray>
ntrk (time) int32 dask.array<chunksize=(977,), meta=np.ndarray>
ntrm (time) int32 dask.array<chunksize=(977,), meta=np.ndarray>
ntrn (time) int32 dask.array<chunksize=(977,), meta=np.ndarray>
sol_tsi (time) float64 dask.array<chunksize=(1,), meta=np.ndarray>
time_bnds (time, nbnd) object dask.array<chunksize=(977, 2), meta=np.ndarray>
time_written (time) |S8 dask.array<chunksize=(1,), meta=np.ndarray>
w_stag (time, slat) float64 dask.array<chunksize=(1, 191), meta=np.ndarray>
wnummax (time, lat) int32 dask.array<chunksize=(1, 192), meta=np.ndarray>
Attributes:
Conventions: CF-1.0
Version: $Name$
case: f09_g16.B.CONTROL03
host: n716005
initial_file: f09_g16.B.04.EXT.cam.i.2075-01-01-00000.nc
logname: clmfrm01
revision_Id: $Id$
source: CAM
title: UNSET
topography_file: /d/04/clmfrm/din/atm/cam/topo/USGS-gtopo30_0.9x1.25_rem...
The output of this cell reveals that the data span 977 monthly time steps and are further indexed by latitude, longitude, and atmospheric level. There are 114 data variables available, and full metadata is accessible by clicking the icon that looks like a sheet of paper at the right side of each variable.
Suppose that we're interested in the net longwave flux at the top of the model. The variable description above tells us that the relevant variable is "FLNT". Let's create an interactive plot of FLNT across the entire globe for all nine cases.
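Variable descriptions can also be read programmatically from each variable's `attrs` dictionary. The sketch below assumes the usual `long_name` and `units` attribute keys; the helper `describe_variable`, the tiny dataset `demo`, and the attribute values shown are illustrative stand-ins rather than values copied from the actual store.

```python
import numpy as np
import xarray as xr


def describe_variable(ds, name):
    """Return a one-line summary built from a variable's metadata attributes."""
    attrs = ds[name].attrs
    long_name = attrs.get("long_name", "no long_name")
    units = attrs.get("units", "no units")
    return f"{name}: {long_name} [{units}]"


# Hypothetical stand-in for cases["CONTROL"]; attribute values are illustrative.
demo = xr.Dataset({"FLNT": (("lat", "lon"), np.zeros((2, 3), dtype="float32"))})
demo["FLNT"].attrs = {
    "long_name": "Net longwave flux at top of model",
    "units": "W/m2",
}

print(describe_variable(demo, "FLNT"))
```

Against the real data, the call would be `describe_variable(cases["CONTROL"], "FLNT")`.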
We're going to use the GeoViews library to create the interactive plot, so we'll import it into our IPython session. We'll set the backend renderer to 'bokeh', a library which renders interactive plots for the web.
Now, we'll prepare the plots of FLNT. We have to specify that time, lat, and lon are the "key" dimensions, while FLNT is the "value" dimension of interest. To generate the plots, we simply use the gv.Dataset.to() function and apply a few options to specify an orthographic projection and render the coastline.
The GeoViews DynamicMap this generated lets you move a slider to examine FLNT at a chosen time. (If you are viewing this notebook as a static web page rather than on a live server, the slider won't work.) Whenever we change the slider, the data is automatically fetched for all nine cases, which takes several seconds -- far faster than extracting the data by hand!
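For a quick static look at a single time step, xarray's own `.plot()` method is enough. The sketch below assumes that `flnt_today` is simply the final time step of FLNT for one case; how it was actually selected is not shown, and the helper `latest_time_step` and the synthetic dataset `demo` are hypothetical.

```python
import numpy as np
import xarray as xr


def latest_time_step(ds, name="FLNT"):
    """Return the last time step of a variable (an assumed definition)."""
    return ds[name].isel(time=-1)


# Hypothetical stand-in for cases["CONTROL"].
rng = np.random.default_rng(0)
demo = xr.Dataset(
    {"FLNT": (("time", "lat", "lon"), rng.random((3, 19, 36)))},
    coords={
        "time": np.arange(3),
        "lat": np.linspace(-90, 90, 19),
        "lon": np.arange(0, 360, 10.0),
    },
)

flnt_today = latest_time_step(demo)

# For 2D data, xarray draws a pcolormesh and returns the matplotlib artist.
mesh = flnt_today.plot()
```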
<matplotlib.collections.QuadMesh at 0x7f0008fc1490>
xarray objects like flnt_today have built-in plotting functionality which uses matplotlib. When we want more advanced functionality, like plotting a stereographic projection from the North Pole, we can call on the matplotlib tools themselves, with cartopy supplying the map projection. Here we'll set up a set of figure axes to use a stereographic projection, plot the coastline, and plot flnt_today against the axes.
In [11]:
import cartopy.feature
import matplotlib.path as mpath
import matplotlib.pyplot as plt
import numpy as np
from cartopy import crs

fig = plt.figure(figsize=[10, 10])
ax = plt.axes(projection=crs.NorthPolarStereo())
ax.set_extent([-180, 180, 65, 90], crs.PlateCarree())
ax.add_feature(cartopy.feature.COASTLINE)
ax.gridlines()

# Compute a circle in axes coordinates, which we can use as a boundary
# for the map. We can pan/zoom as much as we like - the boundary will be
# permanently circular.
theta = np.linspace(0, 2*np.pi, 100)
center, radius = [0.5, 0.5], 0.5
verts = np.vstack([np.sin(theta), np.cos(theta)]).T
circle = mpath.Path(verts * radius + center)
ax.set_boundary(circle, transform=ax.transAxes)

# The data lives on a regular lat/lon grid, hence the PlateCarree transform
flnt_today.plot(ax=ax, transform=crs.PlateCarree())
Out[11]:
<matplotlib.collections.QuadMesh at 0x7f00225e9f70>
Ta-da! We've seen a couple of different ways to produce figures and explore data quickly on the cloud with zarr, xarray, GeoViews, and matplotlib.